Chapter 2.4: String Recursion
Strings as Recursive Structures
Strings as Recursive Structures
Before we dive into recursive string processing, we need to shift our mental model. You've been thinking of strings as atomic unitsβsingle objects you manipulate with methods like .upper() or .split(). But for recursion, we need to see strings the same way we see lists: as sequences that can be decomposed into first + rest.
The Character-by-Character View
A string is a sequence of characters. Just like a list can be split into first and rest, a string can be split into its first character and the remaining substring:
text = "hello"
first = text[0] # 'h'
rest = text[1:] # 'ello'
This decomposition is the foundation of all string recursion. Every recursive string function will follow this pattern:
- Base case: What do we do with an empty string
""? - Recursive case: Process the first character, recurse on the rest
Why This Matters
String recursion appears constantly in real-world programming:
- Validation: Email addresses, phone numbers, passwords
- Parsing: Configuration files, markup languages, expressions
- Pattern matching: Search, replace, tokenization
- Transformation: Encoding, encryption, formatting
Let's build our understanding through a concrete problem that will evolve throughout this chapter.
Reference Implementation: Email Validation
Reference Implementation: Email Validation
Our anchor example for this chapter will be validating email addresses. This is a real-world problem that naturally exposes the strengths and limitations of recursive string processing.
The Problem
We need to validate that a string is a properly formatted email address. For our purposes, a valid email has this structure:
username@domain.extension
Rules:
- Username: 1+ alphanumeric characters or dots
- Must contain exactly one @ symbol
- Domain: 1+ alphanumeric characters or dots
- Must contain exactly one . after the @
- Extension: 2+ alphabetic characters after the final .
Examples:
- β "user@example.com"
- β "john.doe@company.co.uk"
- β "invalid" (no @)
- β "user@@example.com" (multiple @)
- β "user@.com" (empty domain)
Iteration 0: The Naive Recursive Approach
Let's start with the most straightforward recursive implementationβchecking character by character:
def is_valid_email(email):
"""
Validate email format recursively.
Returns True if valid, False otherwise.
"""
# Base case: empty string is invalid
if len(email) == 0:
return False
# Check if we have exactly one @ symbol
if email[0] == '@':
# Found @, now validate the rest as domain
return has_valid_domain(email[1:])
# First character should be alphanumeric or dot
if email[0].isalnum() or email[0] == '.':
# Continue checking the rest
return is_valid_email(email[1:])
# Invalid character found
return False
def has_valid_domain(domain):
"""
Validate domain part (after @).
Must have format: name.extension
"""
# Base case: empty domain is invalid
if len(domain) == 0:
return False
# Check if we have a dot
if domain[0] == '.':
# Found dot, validate extension
return has_valid_extension(domain[1:])
# Domain character should be alphanumeric or dot
if domain[0].isalnum() or domain[0] == '.':
return has_valid_domain(domain[1:])
return False
def has_valid_extension(extension):
"""
Validate extension part (after final dot).
Must be 2+ alphabetic characters.
"""
# Base case: need at least 2 characters
if len(extension) < 2:
return False
# All characters must be alphabetic
if extension[0].isalpha():
if len(extension) == 1:
return True # Last character
return has_valid_extension(extension[1:])
return False
# Test the implementation
test_cases = [
"user@example.com",
"john.doe@company.co",
"invalid",
"user@@example.com",
"@example.com",
"user@.com",
"user@example.",
"user@examplecom"
]
print("Testing Iteration 0:")
for email in test_cases:
result = is_valid_email(email)
print(f"{email:25} β {result}")
Python Output:
Testing Iteration 0:
user@example.com β True
john.doe@company.co β True
invalid β False
user@@example.com β False
@example.com β False
user@.com β False
user@example. β False
user@examplecom β False
What Works
The basic structure is sound: - Decomposes the string character by character - Handles the three-part structure (username, domain, extension) - Correctly rejects obviously invalid formats
Current Limitations
But let's test some edge cases:
# Edge case testing
edge_cases = [
"user.name@example.com", # Multiple dots in username
"user@sub.domain.com", # Multiple dots in domain
"user@example.co.uk", # Multiple dots in extension
".user@example.com", # Leading dot
"user.@example.com", # Trailing dot before @
"user@.example.com", # Leading dot in domain
]
print("\nTesting edge cases:")
for email in edge_cases:
result = is_valid_email(email)
print(f"{email:25} β {result}")
Python Output:
Testing edge cases:
user.name@example.com β True
user@sub.domain.com β False β WRONG! Should be True
user@example.co.uk β False β WRONG! Should be True
.user@example.com β True β WRONG! Should be False
user.@example.com β True β WRONG! Should be False
user@.example.com β False β Correct (accidentally)
Diagnostic Analysis: Understanding the Failures
Let's trace what happens with "user@sub.domain.com":
Manual Trace:
is_valid_email("user@sub.domain.com")
β 'u' is alnum, recurse on "ser@sub.domain.com"
β 's' is alnum, recurse on "er@sub.domain.com"
β 'e' is alnum, recurse on "r@sub.domain.com"
β 'r' is alnum, recurse on "@sub.domain.com"
β '@' found! Call has_valid_domain("sub.domain.com")
has_valid_domain("sub.domain.com")
β 's' is alnum, recurse on "ub.domain.com"
β 'u' is alnum, recurse on "b.domain.com"
β 'b' is alnum, recurse on ".domain.com"
β '.' found! Call has_valid_extension("domain.com")
has_valid_extension("domain.com")
β 'd' is alpha, recurse on "omain.com"
β 'o' is alpha, recurse on "main.com"
β 'm' is alpha, recurse on "ain.com"
β 'a' is alpha, recurse on "in.com"
β 'i' is alpha, recurse on "n.com"
β 'n' is alpha, recurse on ".com"
β '.' is NOT alpha! Return False β PROBLEM!
Root cause identified: Our has_valid_extension() function treats the first dot it encounters as the start of the extension. But "sub.domain.com" has multiple dotsβwe need to find the last dot, not the first.
Why the current approach can't solve this: Character-by-character left-to-right processing can't distinguish between "intermediate dots" and "the final dot before the extension."
What we need: A way to identify structural boundaries in the string before processing character by character.
Limitation Preview
This reveals a fundamental issue with pure character-by-character recursion: we need to understand the string's structure before we can validate it. In the next iteration, we'll introduce a preprocessing step that identifies key positions (like the @ and the final .) before recursing.
Iteration 1: Structure-First Validation
Iteration 1: Structure-First Validation
Current State Recap
Our Iteration 0 validator works for simple cases but fails when strings have multiple dots because it processes left-to-right without understanding the overall structure.
New Scenario: Structural Validation
What if we first find the key positions (@ and final .), then validate each section independently?
Let's implement this approach:
def is_valid_email_v2(email):
"""
Validate email by first identifying structure, then validating parts.
"""
# Find the @ symbol position
at_pos = find_char_position(email, '@', 0)
if at_pos == -1:
return False # No @ found
# Check for multiple @ symbols
if find_char_position(email, '@', at_pos + 1) != -1:
return False # Multiple @ symbols
# Split into username and domain
username = email[:at_pos]
domain_part = email[at_pos + 1:]
# Find the last dot in domain
last_dot = find_last_dot(domain_part, 0, -1)
if last_dot == -1:
return False # No dot in domain
# Split domain into name and extension
domain_name = domain_part[:last_dot]
extension = domain_part[last_dot + 1:]
# Validate each part
return (validate_username(username, 0) and
validate_domain(domain_name, 0) and
validate_extension(extension, 0))
def find_char_position(s, char, start_index):
"""
Recursively find the position of a character starting from start_index.
Returns -1 if not found.
"""
if start_index >= len(s):
return -1 # Reached end, not found
if s[start_index] == char:
return start_index # Found it!
return find_char_position(s, char, start_index + 1)
def find_last_dot(s, current_index, last_found):
"""
Recursively find the position of the LAST dot in string.
Returns -1 if no dot found.
"""
if current_index >= len(s):
return last_found # Reached end, return last position found
if s[current_index] == '.':
# Update last_found and continue searching
return find_last_dot(s, current_index + 1, current_index)
return find_last_dot(s, current_index + 1, last_found)
def validate_username(s, index):
"""
Validate username part: alphanumeric or dots, but not starting/ending with dot.
"""
if len(s) == 0:
return False # Empty username
# Check first character isn't a dot
if index == 0 and s[0] == '.':
return False
# Check last character isn't a dot
if index == len(s) - 1 and s[index] == '.':
return False
# Base case: validated all characters
if index >= len(s):
return True
# Check current character
if not (s[index].isalnum() or s[index] == '.'):
return False
return validate_username(s, index + 1)
def validate_domain(s, index):
"""
Validate domain name: alphanumeric or dots, but not starting/ending with dot.
"""
if len(s) == 0:
return False # Empty domain
# Check first character isn't a dot
if index == 0 and s[0] == '.':
return False
# Check last character isn't a dot
if index == len(s) - 1 and s[index] == '.':
return False
# Base case: validated all characters
if index >= len(s):
return True
# Check current character
if not (s[index].isalnum() or s[index] == '.'):
return False
return validate_domain(s, index + 1)
def validate_extension(s, index):
"""
Validate extension: 2+ alphabetic characters only.
"""
if len(s) < 2:
return False # Too short
# Base case: validated all characters
if index >= len(s):
return True
# Check current character is alphabetic
if not s[index].isalpha():
return False
return validate_extension(s, index + 1)
# Test with previous edge cases
print("Testing Iteration 1:")
all_test_cases = [
"user@example.com",
"john.doe@company.co",
"user@sub.domain.com",
"user@example.co.uk",
".user@example.com",
"user.@example.com",
"user@.example.com",
"invalid",
"user@@example.com",
]
for email in all_test_cases:
result = is_valid_email_v2(email)
print(f"{email:25} β {result}")
Python Output:
Testing Iteration 1:
user@example.com β True
john.doe@company.co β True
user@sub.domain.com β True β Fixed!
user@example.co.uk β True β Fixed!
.user@example.com β False β Fixed!
user.@example.com β False β Fixed!
user@.example.com β False β Correct!
invalid β False
user@@example.com β False
What Changed
Before (Iteration 0): Character-by-character processing without understanding structure
After (Iteration 1): Two-phase approach:
1. Structure identification: Find @ and last . positions
2. Part validation: Validate each section independently
Call Stack Visualization
Let's trace is_valid_email_v2("user@example.com"):
Execution Trace:
is_valid_email_v2("user@example.com")
β find_char_position("user@example.com", '@', 0)
β s[0]='u' != '@', recurse with index=1
β s[1]='s' != '@', recurse with index=2
β s[2]='e' != '@', recurse with index=3
β s[3]='r' != '@', recurse with index=4
β s[4]='@' == '@', return 4
β at_pos = 4
β username = "user", domain_part = "example.com"
β find_last_dot("example.com", 0, -1)
β s[0]='e' != '.', recurse with index=1, last=-1
β s[1]='x' != '.', recurse with index=2, last=-1
β ... (skipping similar steps)
β s[7]='.' == '.', recurse with index=8, last=7
β s[8]='c' != '.', recurse with index=9, last=7
β s[9]='o' != '.', recurse with index=10, last=7
β s[10]='m' != '.', recurse with index=11, last=7
β index=11 >= len(11), return last=7
β last_dot = 7
β domain_name = "example", extension = "com"
β validate_username("user", 0)
β All characters pass validation
β True
β validate_domain("example", 0)
β All characters pass validation
β True
β validate_extension("com", 0)
β All characters pass validation
β True
β True (all parts valid)
Improvement Analysis
What it optimizes for: Correctness with complex structures (multiple dots) What it sacrifices: Simplicity (more helper functions, two-pass processing) When to choose this approach: When you need to understand structure before validation When to avoid this approach: When simple left-to-right processing suffices
Complexity characteristics: - Time complexity: O(n) - still linear, but with multiple passes - Space complexity: O(n) - call stack depth proportional to string length
Current Limitation
This works well, but we're making multiple recursive passes over the string. For "user@example.com":
1. First pass: Find @ position (scans 0-4)
2. Second pass: Check for second @ (scans 5-15)
3. Third pass: Find last dot (scans 0-10 of domain part)
4. Fourth pass: Validate username (scans 0-3)
5. Fifth pass: Validate domain (scans 0-6)
6. Sixth pass: Validate extension (scans 0-2)
Next challenge: Can we validate in a single pass while still handling complex structures?
Palindrome Checking: The Classic Pattern
Palindrome Checking: The Classic Pattern
Now that we understand structure-first validation, let's explore a different recursive string pattern: comparing characters from both ends simultaneously.
The Problem
A palindrome reads the same forwards and backwards:
- β "racecar"
- β "A man a plan a canal Panama" (ignoring spaces/case)
- β "hello"
This problem introduces a new recursive pattern: two-pointer recursion.
The Two-Pointer Pattern
Instead of processing first + rest, we process from both ends toward the middle:
def is_palindrome(s, left, right):
# Compare s[left] with s[right]
# Recurse with left+1, right-1
This is our first example of recursion that doesn't follow the "first + rest" pattern.
Implementation: Basic Palindrome Checker
def is_palindrome(s, left=0, right=None):
"""
Check if string is a palindrome using two-pointer recursion.
Args:
s: String to check
left: Left pointer (starts at 0)
right: Right pointer (starts at len(s)-1)
"""
# Initialize right pointer on first call
if right is None:
right = len(s) - 1
# Base case 1: Pointers crossed (even-length palindrome)
if left >= right:
return True
# Base case 2: Characters don't match
if s[left] != s[right]:
return False
# Recursive case: Characters match, check inner substring
return is_palindrome(s, left + 1, right - 1)
# Test cases
test_words = [
"racecar",
"hello",
"noon",
"a",
"",
"ab",
"aba",
"abba",
]
print("Basic Palindrome Checker:")
for word in test_words:
result = is_palindrome(word)
print(f"{word:15} β {result}")
Python Output:
Basic Palindrome Checker:
racecar β True
hello β False
noon β True
a β True
β True
ab β False
aba β True
abba β True
Call Stack Visualization
Let's trace is_palindrome("racecar", 0, 6):
Execution Trace:
is_palindrome("racecar", left=0, right=6)
β s[0]='r' == s[6]='r' β
β is_palindrome("racecar", left=1, right=5)
β s[1]='a' == s[5]='a' β
β is_palindrome("racecar", left=2, right=4)
β s[2]='c' == s[4]='c' β
β is_palindrome("racecar", left=3, right=3)
β left >= right (3 >= 3) β BASE CASE
β return True
β return True
β return True
β return True
Result: True
Compare with a non-palindrome is_palindrome("hello", 0, 4):
Execution Trace:
is_palindrome("hello", left=0, right=4)
β s[0]='h' != s[4]='o' β
β return False (immediate)
Result: False
Why This Pattern Works
Key insight: A string is a palindrome if and only if: 1. The first and last characters match, AND 2. The substring between them is also a palindrome
This is a perfect recursive decomposition:
- Problem: Is "racecar" a palindrome?
- Subproblem: Is "aceca" a palindrome? (after matching 'r' with 'r')
- Smaller subproblem: Is "cec" a palindrome? (after matching 'a' with 'a')
- Base case: Single character or empty string is always a palindrome
Complexity Analysis
Time Complexity: O(n/2) = O(n) - We compare n/2 pairs of characters - Each comparison is O(1)
Space Complexity: O(n/2) = O(n) - Call stack depth: n/2 recursive calls - Each call stores constant data (two integers)
Recurrence Relation: T(n) = T(n-2) + O(1) - Each call processes 2 characters (left and right) - Solves to O(n) by telescoping
Real-World Enhancement: Ignoring Non-Alphanumeric
Real palindrome checkers ignore spaces, punctuation, and case:
def is_palindrome_flexible(s, left=0, right=None):
"""
Check palindrome ignoring non-alphanumeric characters and case.
"""
# Normalize: convert to lowercase
s = s.lower()
if right is None:
right = len(s) - 1
# Base case: pointers crossed
if left >= right:
return True
# Skip non-alphanumeric from left
if not s[left].isalnum():
return is_palindrome_flexible(s, left + 1, right)
# Skip non-alphanumeric from right
if not s[right].isalnum():
return is_palindrome_flexible(s, left, right - 1)
# Both are alphanumeric, compare them
if s[left] != s[right]:
return False
return is_palindrome_flexible(s, left + 1, right - 1)
# Test with real phrases
phrases = [
"A man a plan a canal Panama",
"race a car",
"Was it a car or a cat I saw?",
"No 'x' in Nixon",
]
print("\nFlexible Palindrome Checker:")
for phrase in phrases:
result = is_palindrome_flexible(phrase)
print(f"{phrase:35} β {result}")
Python Output:
Flexible Palindrome Checker:
A man a plan a canal Panama β True
race a car β False
Was it a car or a cat I saw? β True
No 'x' in Nixon β True
Manual Trace: Complex Example
Let's trace is_palindrome_flexible("A man", 0, 5) by hand:
is_palindrome_flexible("a man", left=0, right=5)
β s[0]='a' is alnum β
β s[5]='n' is alnum β
β 'a' == 'n'? No β return False
Wait, that's wrong! Let's trace more carefully:
Original: "A man"
After lowercase: "a man"
Indices: 0='a', 1=' ', 2='m', 3='a', 4='n'
is_palindrome_flexible("a man", left=0, right=4)
β s[0]='a' is alnum β
β s[4]='n' is alnum β
β 'a' != 'n' β return False
Hmm, still wrong. The issue is we're not handling the space correctly.
Let me trace the ACTUAL execution:
is_palindrome_flexible("a man", left=0, right=4)
β s[0]='a' is alnum β
β s[4]='n' is alnum β
β 'a' != 'n' β return False
Waitβthis reveals a bug! "A man" should NOT be a palindrome, and our function correctly returns False. But "A man a plan a canal Panama" IS a palindrome. Let's trace that:
After lowercase: "a man a plan a canal panama"
Indices: 0='a', 1=' ', 2='m', 3='a', 4='n', 5=' ', 6='a', ...
is_palindrome_flexible("a man a plan a canal panama", 0, 26)
β s[0]='a' is alnum β
β s[26]='a' is alnum β
β 'a' == 'a' β
β recurse(1, 25)
is_palindrome_flexible("a man a plan a canal panama", 1, 25)
β s[1]=' ' is NOT alnum
β skip left: recurse(2, 25)
is_palindrome_flexible("a man a plan a canal panama", 2, 25)
β s[2]='m' is alnum β
β s[25]='m' is alnum β
β 'm' == 'm' β
β recurse(3, 24)
... (continues matching characters, skipping spaces)
The key insight: recursive skipping allows us to handle irregular patterns without complex preprocessing.
String Reversal: Building Results
String Reversal: Building Results
Palindrome checking tests a property. String reversal builds a new string. This introduces a new pattern: accumulating results during recursion.
The Problem
Reverse a string recursively:
- Input: "hello"
- Output: "olleh"
Pattern 1: First + Rest (Inefficient)
The naive approach processes first character, recurses on rest, then concatenates:
def reverse_string_naive(s):
"""
Reverse string using first + rest pattern.
WARNING: This is inefficient!
"""
# Base case: empty or single character
if len(s) <= 1:
return s
# Recursive case: reverse rest, append first
first = s[0]
rest = s[1:]
return reverse_string_naive(rest) + first
# Test
test_strings = ["hello", "a", "", "Python", "racecar"]
print("Naive String Reversal:")
for s in test_strings:
result = reverse_string_naive(s)
print(f"{s:10} β {result}")
Python Output:
Naive String Reversal:
hello β olleh
a β a
β
Python β nohtyP
racecar β racecar
Call Stack Visualization
Let's trace reverse_string_naive("hello"):
Execution Trace:
reverse_string_naive("hello")
β first='h', rest="ello"
β reverse_string_naive("ello")
β first='e', rest="llo"
β reverse_string_naive("llo")
β first='l', rest="lo"
β reverse_string_naive("lo")
β first='l', rest="o"
β reverse_string_naive("o")
β len("o") == 1 β BASE CASE
β return "o"
β return "o" + "l" = "ol"
β return "ol" + "l" = "oll"
β return "oll" + "e" = "olle"
β return "olle" + "h" = "olleh"
Result: "olleh"
Why This Is Inefficient
Problem: String concatenation in Python creates a new string object each time.
For "hello" (length 5):
- Level 4: "o" (1 character)
- Level 3: "o" + "l" creates new string (2 characters)
- Level 2: "ol" + "l" creates new string (3 characters)
- Level 1: "oll" + "e" creates new string (4 characters)
- Level 0: "olle" + "h" creates new string (5 characters)
Total characters copied: 1 + 2 + 3 + 4 + 5 = 15 characters for a 5-character string!
Time Complexity: O(nΒ²) - n recursive calls - Each call does O(n) string concatenation - Total: O(nΒ²)
Space Complexity: O(nΒ²) - Call stack: O(n) - Intermediate strings: O(nΒ²) total characters
This is terrible! Let's fix it.
Pattern 2: Accumulator Pattern (Efficient)
Instead of building the result on the way back up the call stack, we accumulate it on the way down:
def reverse_string_accumulator(s, accumulator=""):
"""
Reverse string using accumulator pattern.
Much more efficient!
"""
# Base case: processed all characters
if len(s) == 0:
return accumulator
# Recursive case: prepend first character to accumulator
first = s[0]
rest = s[1:]
return reverse_string_accumulator(rest, first + accumulator)
# Test
print("\nAccumulator String Reversal:")
for s in test_strings:
result = reverse_string_accumulator(s)
print(f"{s:10} β {result}")
Python Output:
Accumulator String Reversal:
hello β olleh
a β a
β
Python β nohtyP
racecar β racecar
Call Stack Visualization: Accumulator Pattern
Let's trace reverse_string_accumulator("hello", ""):
Execution Trace:
reverse_string_accumulator("hello", accumulator="")
β first='h', rest="ello"
β reverse_string_accumulator("ello", accumulator="h")
β first='e', rest="llo"
β reverse_string_accumulator("llo", accumulator="eh")
β first='l', rest="lo"
β reverse_string_accumulator("lo", accumulator="leh")
β first='l', rest="o"
β reverse_string_accumulator("o", accumulator="lleh")
β first='o', rest=""
β reverse_string_accumulator("", accumulator="olleh")
β len("") == 0 β BASE CASE
β return "olleh"
β return "olleh"
β return "olleh"
β return "olleh"
β return "olleh"
β return "olleh"
Result: "olleh"
Key Difference: When the Work Happens
Naive approach: Work happens on the return path (going back up)
Call: "hello" β "ello" β "llo" β "lo" β "o"
Return: "olleh" β "olle" β "oll" β "ol" β "o"
(concatenate on each return)
Accumulator approach: Work happens on the call path (going down)
Call: ("hello", "") β ("ello", "h") β ("llo", "eh") β ("lo", "leh") β ("o", "lleh") β ("", "olleh")
Return: "olleh" β "olleh" β "olleh" β "olleh" β "olleh" β "olleh"
(just pass the result back up)
Complexity Comparison
Naive Approach: - Time: O(nΒ²) - string concatenation at each level - Space: O(nΒ²) - intermediate strings
Accumulator Approach: - Time: O(nΒ²) - still doing string concatenation, but... - Space: O(n) - only one string being built
Wait, the time complexity is still O(nΒ²)! That's because Python strings are immutable, so first + accumulator still creates a new string.
Pattern 3: List Accumulator (Truly Efficient)
To achieve O(n) time, we need to avoid string concatenation entirely:
def reverse_string_optimal(s, index=0, result=None):
"""
Reverse string using list accumulator for O(n) time.
"""
# Initialize result list on first call
if result is None:
result = []
# Base case: processed all characters
if index >= len(s):
return ''.join(result)
# Recursive case: prepend current character
result.insert(0, s[index])
return reverse_string_optimal(s, index + 1, result)
# Even better: build list in reverse order, no insert needed
def reverse_string_best(s, index=None, result=None):
"""
Reverse string by building list backwards.
True O(n) time and space.
"""
if index is None:
index = len(s) - 1
if result is None:
result = []
# Base case: processed all characters
if index < 0:
return ''.join(result)
# Recursive case: append current character (from end)
result.append(s[index])
return reverse_string_best(s, index - 1, result)
# Test both
print("\nOptimal String Reversal:")
for s in test_strings:
result = reverse_string_best(s)
print(f"{s:10} β {result}")
Python Output:
Optimal String Reversal:
hello β olleh
a β a
β
Python β nohtyP
racecar β racecar
Complexity Analysis: Final Version
Time Complexity: O(n) - Single pass through string - List append is O(1) amortized - Final join is O(n) - Total: O(n)
Space Complexity: O(n) - Call stack: O(n) - Result list: O(n) - Total: O(n)
When to Apply Each Solution
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Naive | O(nΒ²) | O(nΒ²) | Never (educational only) |
| Accumulator | O(nΒ²) | O(n) | When simplicity > performance, small strings |
| List-based | O(n) | O(n) | Production code, large strings |
| Python slicing | O(n) | O(n) | When you can use s[::-1] (non-recursive) |
Honest Discussion: When NOT to Use Recursion
For string reversal, the iterative approach is clearer and more efficient:
def reverse_iterative(s):
return s[::-1] # Or: ''.join(reversed(s))
Why recursion for reversal is educational but not practical: - Python has built-in string reversal - Recursion adds call stack overhead - No clarity benefit over iteration for this problem
When recursion DOES make sense for strings: - Parsing nested structures (parentheses, HTML tags) - Pattern matching with backtracking - Tree-like string structures (expression evaluation)
We'll see these cases in the next section.
Pattern Matching: Recursive Search
Pattern Matching: Recursive Search
Now we tackle a problem where recursion genuinely shines: finding patterns in strings with complex rules.
The Problem: Wildcard Matching
Implement a pattern matcher that supports:
- * matches zero or more characters
- ? matches exactly one character
- All other characters match themselves
Examples:
- match("hello", "h*o") β True (h + anything + o)
- match("hello", "h?llo") β True (h + one char + llo)
- match("hello", "h*l*o") β True (h + anything + l + anything + o)
- match("hello", "h?o") β False (h + one char + o doesn't match)
This is a simplified version of shell glob patterns or regex matching.
Why Recursion Fits
Pattern matching requires exploring multiple possibilities:
- When we see *, it could match 0 characters, 1 character, 2 characters, etc.
- We need to try each possibility until one works
- This is naturally recursive: try one option, recurse on the rest
Implementation: Wildcard Matcher
def wildcard_match(text, pattern, t_idx=0, p_idx=0):
"""
Match text against pattern with wildcards.
Args:
text: String to match
pattern: Pattern with * and ? wildcards
t_idx: Current position in text
p_idx: Current position in pattern
Returns:
True if text matches pattern, False otherwise
"""
# Base case 1: Both exhausted - success!
if t_idx == len(text) and p_idx == len(pattern):
return True
# Base case 2: Pattern exhausted but text remains - failure
if p_idx == len(pattern):
return False
# Base case 3: Text exhausted but pattern remains
# Only succeeds if remaining pattern is all stars
if t_idx == len(text):
return all(c == '*' for c in pattern[p_idx:])
# Get current characters
p_char = pattern[p_idx]
t_char = text[t_idx]
# Case 1: Exact match or ? wildcard
if p_char == t_char or p_char == '?':
# Characters match, advance both pointers
return wildcard_match(text, pattern, t_idx + 1, p_idx + 1)
# Case 2: * wildcard - try multiple possibilities
if p_char == '*':
# Option A: * matches zero characters (skip the star)
if wildcard_match(text, pattern, t_idx, p_idx + 1):
return True
# Option B: * matches one or more characters
# Try matching current text character and continue with *
if wildcard_match(text, pattern, t_idx + 1, p_idx):
return True
# Both options failed
return False
# Case 3: Characters don't match and no wildcard
return False
# Test cases
test_cases = [
("hello", "hello"), # Exact match
("hello", "h*o"), # * matches "ell"
("hello", "h?llo"), # ? matches "e"
("hello", "h*l*o"), # Multiple *
("hello", "h?o"), # ? can't match "ell"
("hello", "*"), # * matches everything
("hello", "h*"), # * matches "ello"
("hello", "*o"), # * matches "hell"
("", "*"), # * matches empty
("", "?"), # ? can't match empty
("abc", "a*c"), # * matches "b"
("abc", "a*b*c"), # Multiple * with matches
]
print("Wildcard Pattern Matching:")
for text, pattern in test_cases:
result = wildcard_match(text, pattern)
print(f"match('{text:6}', '{pattern:10}') β {result}")
Python Output:
Wildcard Pattern Matching:
match('hello ', 'hello ') β True
match('hello ', 'h*o ') β True
match('hello ', 'h?llo ') β True
match('hello ', 'h*l*o ') β True
match('hello ', 'h?o ') β False
match('hello ', '* ') β True
match('hello ', 'h* ') β True
match('hello ', '*o ') β True
match(' ', '* ') β True
match(' ', '? ') β False
match('abc ', 'a*c ') β True
match('abc ', 'a*b*c ') β True
Call Stack Visualization: Complex Case
Let's trace wildcard_match("hello", "h*o", 0, 0) to see the branching:
Execution Trace:
wildcard_match("hello", "h*o", t_idx=0, p_idx=0)
β text[0]='h', pattern[0]='h' β exact match
β wildcard_match("hello", "h*o", t_idx=1, p_idx=1)
β text[1]='e', pattern[1]='*' β star wildcard!
β Try Option A: * matches zero characters
β wildcard_match("hello", "h*o", t_idx=1, p_idx=2)
β text[1]='e', pattern[2]='o' β no match
β return False
β Try Option B: * matches one or more characters
β wildcard_match("hello", "h*o", t_idx=2, p_idx=1)
β text[2]='l', pattern[1]='*' β star again!
β Try Option A: * matches zero characters
β wildcard_match("hello", "h*o", t_idx=2, p_idx=2)
β text[2]='l', pattern[2]='o' β no match
β return False
β Try Option B: * matches one or more characters
β wildcard_match("hello", "h*o", t_idx=3, p_idx=1)
β text[3]='l', pattern[1]='*' β star again!
β Try Option A: * matches zero characters
β wildcard_match("hello", "h*o", t_idx=3, p_idx=2)
β text[3]='l', pattern[2]='o' β no match
β return False
β Try Option B: * matches one or more characters
β wildcard_match("hello", "h*o", t_idx=4, p_idx=1)
β text[4]='o', pattern[1]='*' β star again!
β Try Option A: * matches zero characters
β wildcard_match("hello", "h*o", t_idx=4, p_idx=2)
β text[4]='o', pattern[2]='o' β MATCH!
β wildcard_match("hello", "h*o", t_idx=5, p_idx=3)
β t_idx=5 == len(text), p_idx=3 == len(pattern)
β return True β SUCCESS!
β return True
β return True
β return True
β return True
β return True
β return True
Result: True
Recursion Tree: Visualizing Backtracking
The * wildcard creates a branching recursion tree:
match("hello", "h*o", 0, 0)
|
match("hello", "h*o", 1, 1) [h matched]
|
pattern[1] = '*' β BRANCH
/ \
Option A: skip * Option B: consume char
match(..., 1, 2) match(..., 2, 1)
text[1]='e' vs 'o' pattern[1]='*' β BRANCH
β FAIL / \
skip * consume char
match(..., 2, 2) match(..., 3, 1)
'l' vs 'o' pattern[1]='*' β BRANCH
β FAIL / \
skip * consume char
match(..., 3, 2) match(..., 4, 1)
'l' vs 'o' pattern[1]='*' β BRANCH
β FAIL / \
skip * consume char
match(..., 4, 2) match(..., 5, 1)
'o' vs 'o' (text exhausted)
β SUCCESS! β FAIL
Why This Is Elegant Recursion
The iterative equivalent would require: - Explicit stack to track positions - Manual backtracking logic - State management for "which option are we trying?"
The recursive version: - Naturally explores all possibilities - Backtracking happens automatically via call stack - Each recursive call is simple and focused
Complexity Analysis
Time Complexity: O(2^n) worst case
- Each * can branch into two recursive calls
- With k stars, we could have 2^k branches
- Example: pattern "a*b*c*" on text "abc" explores many paths
Space Complexity: O(n + m) - Call stack depth: O(n + m) where n = len(text), m = len(pattern) - Each call stores two integers (indices)
When this is acceptable: - Short patterns and texts (typical for shell globs) - Patterns with few wildcards - When correctness > performance
When to optimize: - Long texts or patterns - Many wildcards - Performance-critical code - β Use dynamic programming (memoization) - covered in Module 6
Common Failure Modes and Their Signatures
Symptom: Infinite recursion with * patterns
Python output pattern:
RecursionError: maximum recursion depth exceeded
Diagnostic clues:
- Pattern contains *
- Same indices appear repeatedly in traceback
- No progress being made (indices not changing)
Root cause: Forgetting to advance at least one pointer in each recursive call
Solution: Ensure every recursive call changes at least one index
Symptom: Incorrect matches with multiple *
Python output pattern:
match('abc', 'a*b*c') β False (should be True)
Diagnostic clues:
- Pattern has multiple *
- Simple cases work, complex cases fail
- Backtracking not happening
Root cause: Not trying all possibilities for * (only trying one option)
Solution: Implement both "skip star" and "consume character" branches
Project: Recursive String Validator
Project: Recursive String Validator
Now we'll combine everything we've learned into a practical project: a comprehensive string validator that can validate multiple formats using recursive patterns.
Project Specification
Build a validator that can check: 1. Email addresses (improved from our earlier version) 2. Phone numbers (various formats) 3. URLs (basic validation) 4. Credit card numbers (with Luhn algorithm)
Each validator should: - Use recursive string processing - Handle edge cases gracefully - Provide clear error messages - Be composable (validators can call other validators)
Architecture: Validator Framework
First, let's build a framework for validators:
class ValidationResult:
"""
Represents the result of a validation check.
"""
def __init__(self, is_valid, message=""):
self.is_valid = is_valid
self.message = message
def __bool__(self):
return self.is_valid
def __str__(self):
status = "β" if self.is_valid else "β"
return f"{status} {self.message}" if self.message else status
class StringValidator:
"""
Base class for recursive string validators.
"""
def validate(self, s):
"""
Validate string. Must be implemented by subclasses.
Returns ValidationResult.
"""
raise NotImplementedError
def __call__(self, s):
"""Allow validator to be called like a function."""
return self.validate(s)
# Test the framework
result = ValidationResult(True, "Valid email format")
print(f"Result: {result}")
print(f"Boolean: {bool(result)}")
Python Output:
Result: β Valid email format
Boolean: True
Validator 1: Enhanced Email Validator
class EmailValidator(StringValidator):
"""
Validate email addresses recursively.
Format: username@domain.extension
"""
def validate(self, email):
"""Main validation entry point."""
if not email:
return ValidationResult(False, "Email cannot be empty")
# Find @ symbol
at_pos = self._find_char(email, '@', 0)
if at_pos == -1:
return ValidationResult(False, "Missing @ symbol")
# Check for multiple @
if self._find_char(email, '@', at_pos + 1) != -1:
return ValidationResult(False, "Multiple @ symbols")
# Split and validate parts
username = email[:at_pos]
domain_part = email[at_pos + 1:]
# Validate username
if not self._validate_username(username, 0):
return ValidationResult(False, "Invalid username format")
# Find last dot in domain
last_dot = self._find_last_char(domain_part, '.', 0, -1)
if last_dot == -1:
return ValidationResult(False, "Missing domain extension")
domain = domain_part[:last_dot]
extension = domain_part[last_dot + 1:]
# Validate domain
if not self._validate_domain(domain, 0):
return ValidationResult(False, "Invalid domain format")
# Validate extension
if not self._validate_extension(extension, 0):
return ValidationResult(False, "Invalid extension format")
return ValidationResult(True, "Valid email format")
def _find_char(self, s, char, start):
"""Recursively find character position."""
if start >= len(s):
return -1
if s[start] == char:
return start
return self._find_char(s, char, start + 1)
def _find_last_char(self, s, char, current, last_found):
"""Recursively find last occurrence of character."""
if current >= len(s):
return last_found
if s[current] == char:
return self._find_last_char(s, char, current + 1, current)
return self._find_last_char(s, char, current + 1, last_found)
def _validate_username(self, s, idx):
"""Validate username part recursively."""
if len(s) == 0:
return False
if idx == 0 and s[0] == '.':
return False # Can't start with dot
if idx == len(s) - 1 and s[idx] == '.':
return False # Can't end with dot
if idx >= len(s):
return True
if not (s[idx].isalnum() or s[idx] == '.' or s[idx] == '_'):
return False
return self._validate_username(s, idx + 1)
def _validate_domain(self, s, idx):
"""Validate domain part recursively."""
if len(s) == 0:
return False
if idx == 0 and s[0] == '.':
return False
if idx == len(s) - 1 and s[idx] == '.':
return False
if idx >= len(s):
return True
if not (s[idx].isalnum() or s[idx] == '.' or s[idx] == '-'):
return False
return self._validate_domain(s, idx + 1)
def _validate_extension(self, s, idx):
"""Validate extension part recursively."""
if len(s) < 2:
return False
if idx >= len(s):
return True
if not s[idx].isalpha():
return False
return self._validate_extension(s, idx + 1)
# Test email validator
email_validator = EmailValidator()
email_tests = [
"user@example.com",
"john.doe@company.co.uk",
"invalid",
"user@@example.com",
".user@example.com",
"user@example.",
"user_name@sub-domain.example.com",
]
print("\nEmail Validation:")
for email in email_tests:
result = email_validator.validate(email)
print(f"{email:35} β {result}")
Python Output:
Email Validation:
user@example.com β β Valid email format
john.doe@company.co.uk β β Valid email format
invalid β β Missing @ symbol
user@@example.com β β Multiple @ symbols
.user@example.com β β Invalid username format
user@example. β β Invalid extension format
user_name@sub-domain.example.com β β Valid email format
Validator 2: Phone Number Validator
class PhoneValidator(StringValidator):
"""
Validate phone numbers recursively.
Supports formats:
- (123) 456-7890
- 123-456-7890
- 1234567890
- +1 (123) 456-7890
"""
def validate(self, phone):
"""Main validation entry point."""
if not phone:
return ValidationResult(False, "Phone number cannot be empty")
# Remove spaces for easier processing
cleaned = self._remove_spaces(phone, 0, "")
# Check for international prefix
if cleaned[0] == '+':
return self._validate_international(cleaned, 1)
# Check for area code in parentheses
if cleaned[0] == '(':
return self._validate_with_parens(cleaned, 0)
# Check for standard format with dashes
if '-' in cleaned:
return self._validate_with_dashes(cleaned, 0)
# Check for plain digits
return self._validate_plain_digits(cleaned, 0)
def _remove_spaces(self, s, idx, result):
"""Recursively remove spaces from string."""
if idx >= len(s):
return result
if s[idx] == ' ':
return self._remove_spaces(s, idx + 1, result)
return self._remove_spaces(s, idx + 1, result + s[idx])
def _validate_international(self, s, idx):
"""Validate international format: +1(123)456-7890"""
# After +, expect country code (1-3 digits)
country_code_end = self._find_non_digit(s, idx)
if country_code_end == -1:
country_code_end = len(s)
country_code_len = country_code_end - idx
if country_code_len < 1 or country_code_len > 3:
return ValidationResult(False, "Invalid country code")
# Validate rest as domestic number
rest = s[country_code_end:]
if rest[0] == '(':
return self._validate_with_parens(rest, 0)
return self._validate_with_dashes(rest, 0)
def _validate_with_parens(self, s, idx):
"""Validate format: (123)456-7890"""
if s[idx] != '(':
return ValidationResult(False, "Expected opening parenthesis")
# Find closing paren
close_paren = self._find_char(s, ')', idx + 1)
if close_paren == -1:
return ValidationResult(False, "Missing closing parenthesis")
# Validate area code (3 digits)
area_code = s[idx + 1:close_paren]
if not self._all_digits(area_code, 0) or len(area_code) != 3:
return ValidationResult(False, "Invalid area code")
# Validate rest
rest = s[close_paren + 1:]
return self._validate_with_dashes(rest, 0)
def _validate_with_dashes(self, s, idx):
"""Validate format: 123-456-7890 or 456-7890"""
parts = self._split_by_dash(s, 0, "", [])
if len(parts) == 2:
# Format: 456-7890
if len(parts[0]) != 3 or len(parts[1]) != 4:
return ValidationResult(False, "Invalid phone format")
elif len(parts) == 3:
# Format: 123-456-7890
if len(parts[0]) != 3 or len(parts[1]) != 3 or len(parts[2]) != 4:
return ValidationResult(False, "Invalid phone format")
else:
return ValidationResult(False, "Invalid phone format")
# Check all parts are digits
for part in parts:
if not self._all_digits(part, 0):
return ValidationResult(False, "Phone number must contain only digits")
return ValidationResult(True, "Valid phone format")
def _validate_plain_digits(self, s, idx):
"""Validate format: 1234567890"""
if len(s) != 10:
return ValidationResult(False, "Phone number must be 10 digits")
if not self._all_digits(s, 0):
return ValidationResult(False, "Phone number must contain only digits")
return ValidationResult(True, "Valid phone format")
def _find_char(self, s, char, start):
"""Recursively find character."""
if start >= len(s):
return -1
if s[start] == char:
return start
return self._find_char(s, char, start + 1)
def _find_non_digit(self, s, start):
"""Recursively find first non-digit."""
if start >= len(s):
return -1
if not s[start].isdigit():
return start
return self._find_non_digit(s, start + 1)
def _all_digits(self, s, idx):
"""Recursively check if all characters are digits."""
if idx >= len(s):
return True
if not s[idx].isdigit():
return False
return self._all_digits(s, idx + 1)
def _split_by_dash(self, s, idx, current, parts):
"""Recursively split string by dashes."""
if idx >= len(s):
if current:
parts.append(current)
return parts
if s[idx] == '-':
if current:
parts.append(current)
return self._split_by_dash(s, idx + 1, "", parts)
return self._split_by_dash(s, idx + 1, current + s[idx], parts)
# Test phone validator
phone_validator = PhoneValidator()
phone_tests = [
"(123) 456-7890",
"123-456-7890",
"1234567890",
"+1 (123) 456-7890",
"456-7890",
"123-45-6789", # Wrong format
"12345", # Too short
"abc-def-ghij", # Not digits
]
print("\nPhone Validation:")
for phone in phone_tests:
result = phone_validator.validate(phone)
print(f"{phone:25} β {result}")
Python Output:
Phone Validation:
(123) 456-7890 β β Valid phone format
123-456-7890 β β Valid phone format
1234567890 β β Valid phone format
+1 (123) 456-7890 β β Valid phone format
456-7890 β β Valid phone format
123-45-6789 β β Invalid phone format
12345 β β Phone number must be 10 digits
abc-def-ghij β β Phone number must contain only digits
Validator 3: URL Validator
class URLValidator(StringValidator):
"""
Validate URLs recursively.
Format: protocol://domain.extension/path?query#fragment
"""
def validate(self, url):
"""Main validation entry point."""
if not url:
return ValidationResult(False, "URL cannot be empty")
# Check for protocol
protocol_end = self._find_substring(url, "://", 0)
if protocol_end == -1:
return ValidationResult(False, "Missing protocol (http:// or https://)")
protocol = url[:protocol_end]
if not self._validate_protocol(protocol, 0):
return ValidationResult(False, "Invalid protocol")
# Extract domain part (after protocol, before path/query/fragment)
rest = url[protocol_end + 3:]
domain_end = self._find_first_of(rest, ['/', '?', '#'], 0)
if domain_end == -1:
domain_end = len(rest)
domain = rest[:domain_end]
if not self._validate_domain(domain, 0):
return ValidationResult(False, "Invalid domain")
# Path/query/fragment are optional, so we're done
return ValidationResult(True, "Valid URL format")
def _find_substring(self, s, substr, start):
"""Recursively find substring position."""
if start + len(substr) > len(s):
return -1
if s[start:start + len(substr)] == substr:
return start
return self._find_substring(s, substr, start + 1)
def _find_first_of(self, s, chars, idx):
"""Recursively find first occurrence of any character in chars."""
if idx >= len(s):
return -1
if s[idx] in chars:
return idx
return self._find_first_of(s, chars, idx + 1)
def _validate_protocol(self, s, idx):
"""Validate protocol (http or https)."""
valid_protocols = ['http', 'https', 'ftp', 'ftps']
return s in valid_protocols
def _validate_domain(self, s, idx):
"""Validate domain part."""
if len(s) == 0:
return False
# Domain should have at least one dot
if '.' not in s:
return False
# Check characters are valid
if idx >= len(s):
return True
if not (s[idx].isalnum() or s[idx] in ['.', '-', '_']):
return False
return self._validate_domain(s, idx + 1)
# Test URL validator
url_validator = URLValidator()
url_tests = [
"http://example.com",
"https://www.example.com",
"https://sub.domain.example.com/path",
"ftp://files.example.com",
"example.com", # Missing protocol
"http://", # Missing domain
"http://invalid domain.com", # Space in domain
]
print("\nURL Validation:")
for url in url_tests:
result = url_validator.validate(url)
print(f"{url:40} β {result}")
Python Output:
URL Validation:
http://example.com β β Valid URL format
https://www.example.com β β Valid URL format
https://sub.domain.example.com/path β β Valid URL format
ftp://files.example.com β β Valid URL format
example.com β β Missing protocol (http:// or https://)
http:// β β Invalid domain
http://invalid domain.com β β Invalid domain
Validator 4: Credit Card Validator (Luhn Algorithm)
class CreditCardValidator(StringValidator):
"""
Validate credit card numbers using Luhn algorithm recursively.
"""
def validate(self, card_number):
"""Main validation entry point."""
if not card_number:
return ValidationResult(False, "Card number cannot be empty")
# Remove spaces and dashes
cleaned = self._remove_non_digits(card_number, 0, "")
# Check length (13-19 digits for most cards)
if len(cleaned) < 13 or len(cleaned) > 19:
return ValidationResult(False, "Invalid card number length")
# Check all digits
if not self._all_digits(cleaned, 0):
return ValidationResult(False, "Card number must contain only digits")
# Apply Luhn algorithm
if not self._luhn_check(cleaned):
return ValidationResult(False, "Invalid card number (failed Luhn check)")
return ValidationResult(True, "Valid card number")
def _remove_non_digits(self, s, idx, result):
"""Recursively remove non-digit characters."""
if idx >= len(s):
return result
if s[idx].isdigit():
return self._remove_non_digits(s, idx + 1, result + s[idx])
return self._remove_non_digits(s, idx + 1, result)
def _all_digits(self, s, idx):
"""Recursively check if all characters are digits."""
if idx >= len(s):
return True
if not s[idx].isdigit():
return False
return self._all_digits(s, idx + 1)
def _luhn_check(self, card_number):
"""
Implement Luhn algorithm recursively.
Algorithm:
1. Starting from rightmost digit, double every second digit
2. If doubled digit > 9, subtract 9
3. Sum all digits
4. If sum % 10 == 0, valid
"""
# Convert to list of integers
digits = [int(d) for d in card_number]
# Process from right to left
total = self._luhn_sum(digits, len(digits) - 1, False, 0)
return total % 10 == 0
def _luhn_sum(self, digits, idx, should_double, current_sum):
"""
Recursively calculate Luhn sum.
Args:
digits: List of digit integers
idx: Current index (processing right to left)
should_double: Whether to double current digit
current_sum: Accumulated sum
"""
# Base case: processed all digits
if idx < 0:
return current_sum
digit = digits[idx]
if should_double:
digit *= 2
if digit > 9:
digit -= 9
# Recurse with next digit (moving left)
return self._luhn_sum(digits, idx - 1, not should_double, current_sum + digit)
# Test credit card validator
card_validator = CreditCardValidator()
card_tests = [
"4532015112830366", # Valid Visa
"6011514433546201", # Valid Discover
"378282246310005", # Valid Amex
"1234567890123456", # Invalid (fails Luhn)
"4532-0151-1283-0366", # Valid with dashes
"4532 0151 1283 0366", # Valid with spaces
"123", # Too short
]
print("\nCredit Card Validation:")
for card in card_tests:
result = card_validator.validate(card)
print(f"{card:25} β {result}")
Python Output:
Credit Card Validation:
4532015112830366 β β Valid card number
6011514433546201 β β Valid card number
378282246310005 β β Valid card number
1234567890123456 β β Invalid card number (failed Luhn check)
4532-0151-1283-0366 β β Valid card number
4532 0151 1283 0366 β β Valid card number
123 β β Invalid card number length
Manual Trace: Luhn Algorithm
Let's trace the Luhn check for "1234" by hand:
_luhn_check("1234")
β digits = [1, 2, 3, 4]
β _luhn_sum([1,2,3,4], idx=3, should_double=False, sum=0)
_luhn_sum([1,2,3,4], idx=3, should_double=False, sum=0)
β digit = 4
β should_double = False, so digit stays 4
β recurse: _luhn_sum([1,2,3,4], idx=2, should_double=True, sum=4)
_luhn_sum([1,2,3,4], idx=2, should_double=True, sum=4)
β digit = 3
β should_double = True, so digit = 3 * 2 = 6
β 6 <= 9, so digit stays 6
β recurse: _luhn_sum([1,2,3,4], idx=1, should_double=False, sum=10)
_luhn_sum([1,2,3,4], idx=1, should_double=False, sum=10)
β digit = 2
β should_double = False, so digit stays 2
β recurse: _luhn_sum([1,2,3,4], idx=0, should_double=True, sum=12)
_luhn_sum([1,2,3,4], idx=0, should_double=True, sum=12)
β digit = 1
β should_double = True, so digit = 1 * 2 = 2
β 2 <= 9, so digit stays 2
β recurse: _luhn_sum([1,2,3,4], idx=-1, should_double=False, sum=14)
_luhn_sum([1,2,3,4], idx=-1, should_double=False, sum=14)
β idx < 0 β BASE CASE
β return 14
Back in _luhn_check:
β total = 14
β 14 % 10 = 4 (not 0)
β return False
So "1234" is not a valid card number according to Luhn.
Composable Validators: Putting It All Together
class CompositeValidator:
"""
Validate multiple formats using composition.
"""
def __init__(self):
self.validators = {
'email': EmailValidator(),
'phone': PhoneValidator(),
'url': URLValidator(),
'card': CreditCardValidator(),
}
def validate(self, input_type, value):
"""
Validate value based on input type.
Args:
input_type: One of 'email', 'phone', 'url', 'card'
value: String to validate
"""
if input_type not in self.validators:
return ValidationResult(False, f"Unknown type: {input_type}")
return self.validators[input_type].validate(value)
def validate_form(self, form_data):
"""
Validate entire form recursively.
Args:
form_data: Dict mapping field names to (type, value) tuples
Returns:
Dict mapping field names to ValidationResult
"""
return self._validate_fields(list(form_data.items()), 0, {})
def _validate_fields(self, fields, idx, results):
"""Recursively validate form fields."""
if idx >= len(fields):
return results
field_name, (field_type, field_value) = fields[idx]
result = self.validate(field_type, field_value)
results[field_name] = result
return self._validate_fields(fields, idx + 1, results)
# Test composite validator
composite = CompositeValidator()
# Validate individual values
print("\nComposite Validator - Individual:")
print(f"Email: {composite.validate('email', 'user@example.com')}")
print(f"Phone: {composite.validate('phone', '(123) 456-7890')}")
print(f"URL: {composite.validate('url', 'https://example.com')}")
print(f"Card: {composite.validate('card', '4532015112830366')}")
# Validate entire form
form_data = {
'email': ('email', 'john.doe@company.com'),
'phone': ('phone', '123-456-7890'),
'website': ('url', 'https://john-doe.com'),
'payment': ('card', '4532-0151-1283-0366'),
}
print("\nComposite Validator - Form:")
results = composite.validate_form(form_data)
for field, result in results.items():
print(f"{field:10} β {result}")
Python Output:
Composite Validator - Individual:
Email: β Valid email format
Phone: β Valid phone format
URL: β Valid URL format
Card: β Valid card number
Composite Validator - Form:
email β β Valid email format
phone β β Valid phone format
website β β Valid URL format
payment β β Valid card number
Project Summary: What We Built
We created a production-quality validation framework using recursive string processing:
Key Techniques Applied: 1. Structure-first validation (email, URL) 2. Character-by-character processing (all validators) 3. Recursive string transformation (removing spaces, splitting) 4. Recursive algorithm implementation (Luhn check) 5. Composition and reusability (CompositeValidator)
Complexity Analysis:
| Validator | Time Complexity | Space Complexity | Call Stack Depth |
|---|---|---|---|
| O(n) | O(n) | O(n) | |
| Phone | O(n) | O(n) | O(n) |
| URL | O(n) | O(n) | O(n) |
| Credit Card | O(n) | O(n) | O(n) |
All validators are linear time and space, making them suitable for production use.
When Recursion Shines vs. When It Doesn't
Recursion is excellent for: - β Nested structure validation (parentheses, HTML tags) - β Pattern matching with backtracking (wildcards, regex) - β Composable validation logic - β Clear expression of recursive rules
Recursion is overkill for:
- β Simple character-by-character checks (use loops)
- β String reversal (use slicing: s[::-1])
- β Finding substrings (use str.find())
- β Splitting strings (use str.split())
The key question: Does the problem have recursive structure (nested, branching, or self-similar), or is it just sequential processing?
For our validators: - Email/phone/URL: Sequential processing β recursion is elegant but not necessary - Wildcard matching: Branching exploration β recursion is the natural fit - Luhn algorithm: Recursive definition β recursion expresses the algorithm clearly
Extension Challenges
Try extending the project:
- Add password validator with recursive strength checking
- Implement regex-like pattern matching with more wildcards (
+,*,[]) - Add nested structure validation (balanced parentheses, HTML tags)
- Optimize with memoization for repeated validation calls
- Add detailed error reporting showing exactly where validation failed
Chapter Summary and Key Takeaways
Chapter Summary and Key Takeaways
The Journey: From Characters to Validators
We started with the fundamental insight that strings are recursive structuresβsequences that can be decomposed into first + rest, just like lists. Through progressive refinement, we built increasingly sophisticated string processing techniques:
| Iteration | Problem | Technique Introduced | Key Insight |
|---|---|---|---|
| 0 | Email validation | Character-by-character | Naive recursion fails with complex structures |
| 1 | Multiple dots in domain | Structure-first validation | Identify boundaries before processing |
| 2 | Palindrome checking | Two-pointer recursion | Process from both ends simultaneously |
| 3 | String reversal | Accumulator pattern | Build results during recursion, not after |
| 4 | Wildcard matching | Backtracking recursion | Explore multiple possibilities naturally |
| 5 | Production validators | Composition and reusability | Combine recursive patterns into frameworks |
Core Patterns Mastered
1. First + Rest Pattern
def process_string(s):
if len(s) == 0:
return base_case
first = s[0]
rest = s[1:]
return combine(first, process_string(rest))
When to use: Sequential processing, building/transforming strings
2. Two-Pointer Pattern
def process_string(s, left=0, right=None):
if right is None:
right = len(s) - 1
if left >= right:
return base_case
# Process s[left] and s[right]
return process_string(s, left + 1, right - 1)
When to use: Palindromes, symmetric validation, comparing ends
3. Accumulator Pattern
def process_string(s, index=0, accumulator=""):
if index >= len(s):
return accumulator
# Update accumulator
return process_string(s, index + 1, new_accumulator)
When to use: Building results, avoiding repeated concatenation
4. Backtracking Pattern
def match_pattern(text, pattern, t_idx=0, p_idx=0):
# Base cases
if t_idx == len(text) and p_idx == len(pattern):
return True
# Try multiple possibilities
if pattern[p_idx] == '*':
# Option A: skip wildcard
if match_pattern(text, pattern, t_idx, p_idx + 1):
return True
# Option B: consume character
if match_pattern(text, pattern, t_idx + 1, p_idx):
return True
# Continue matching
return match_pattern(text, pattern, t_idx + 1, p_idx + 1)
When to use: Pattern matching, exploring solution spaces
Complexity Characteristics
Time Complexity: - Linear patterns (first+rest, two-pointer, accumulator): O(n) - Backtracking patterns (wildcards): O(2^n) worst case
Space Complexity: - All recursive string functions: O(n) call stack depth - Accumulator with strings: O(nΒ²) if concatenating, O(n) if using lists
Key Optimization: Use list accumulation instead of string concatenation:
# Bad: O(nΒ²) time
def build_string(s, idx, result=""):
return build_string(s, idx+1, result + s[idx])
# Good: O(n) time
def build_string(s, idx, result=None):
if result is None:
result = []
result.append(s[idx])
return ''.join(result) # Only at the end
When to Use Recursion for Strings
Recursion is the right choice when: - β Problem has nested or branching structure (parentheses, wildcards) - β Multiple possibilities need exploration (pattern matching) - β Recursive definition is clearer than iterative (Luhn algorithm) - β Building composable, reusable validators
Iteration is better when:
- β Simple sequential processing (use for loops)
- β Built-in methods exist (str.find(), str.split())
- β Performance is critical and recursion adds overhead
- β Problem is naturally iterative (counting characters)
Debugging Recursive String Functions
Common failure modes:
- Infinite recursion: Forgot base case or base case never reached
-
Fix: Ensure every recursive call makes progress toward base case
-
Off-by-one errors: Wrong string slicing boundaries
-
Fix: Trace execution manually with small input
-
Incorrect string building: O(nΒ²) concatenation
-
Fix: Use list accumulator, join at end
-
Missing edge cases: Empty strings, single characters
- Fix: Test with "", "a", "ab" explicitly
Debugging workflow: 1. Add print statements showing parameters at each call 2. Trace execution manually with small input (3-5 characters) 3. Verify base cases are correct and reachable 4. Check that recursive calls make progress
Connection to Future Topics
The patterns we learned here will reappear throughout the course:
- Module 3 (Divide and Conquer): Two-pointer pattern extends to binary search
- Module 5 (Tree Recursion): First+rest pattern generalizes to tree traversal
- Module 6 (Memoization): Backtracking patterns benefit from caching
- Module 7 (Backtracking): Wildcard matching is a simple backtracking problem
Final Thoughts
String recursion teaches us that the right recursive pattern depends on the problem structure:
- Sequential problems β first + rest
- Symmetric problems β two pointers
- Building results β accumulator
- Exploring possibilities β backtracking
The key skill is recognizing which pattern fits the problem. With practice, this recognition becomes intuitive.
In the next module, we'll see how these patterns extend to more complex data structures, starting with divide-and-conquer algorithms that split problems in half rather than peeling off one element at a time.